Abstract

In a sampling problem, we are given an input $x \in \{0,1\}^n$, and asked to sample approximately
from a probability distribution $\mathcal{D}_x$ over poly(n)-bit strings. In a search problem, we are given an
input $x \in \{0,1\}^n$, and asked to find a member of a nonempty set $A_x$ with high probability. (An
example is finding a Nash equilibrium.) In this paper, we use tools from Kolmogorov complexity
and algorithmic information theory to show that sampling and search problems are essentially
equivalent. More precisely, for any sampling problem $S$, there exists a search problem $R_S$ such
that, if C is any "reasonable" complexity class, then $R_S$ is in the search version of C if and only
if $S$ is in the sampling version.

As one application, we show that SampP = SampBQP if and only if FBPP = FBQP: in 
other words, classical computers can efficiently sample the output distribution of every quantum 
circuit, if and only if they can efficiently solve every search problem that quantum computers 
can solve. A second application is that, assuming a plausible conjecture, there exists a search 
problem R that can be solved using a simple linear-optics experiment, but that cannot be solved 
efficiently by a classical computer unless the polynomial hierarchy collapses. That application 
will be described in a forthcoming paper with Alex Arkhipov on the computational complexity 
of linear optics. 

1 Introduction 

The Extended Church-Turing Thesis (ECT) says that all computational problems that are feasibly 
solvable in the physical world are feasibly solvable by a probabilistic Turing machine. By now, 
there have been almost two decades of discussion about this thesis, and the challenge that quantum 
computing poses to it. This paper is about a related question that has attracted surprisingly little 
interest: namely, what exactly should we understand the ECT to state? When we say "all 
computational problems," do we mean decision problems? promise problems? search problems? 
sampling problems? possibly other types of problems? Could the ECT hold for some of these types 
of problems but fail for others? 

Our main result is an equivalence between sampling and search problems: the ECT holds for 
one type of problem if and only if it holds for the other. As a motivating example, we will prove the 
surprising fact that, if classical computers can efficiently solve any search problem that quantum 
computers can solve, then they can also approximately sample the output distribution of any 
quantum circuit. The proof makes essential use of Kolmogorov complexity. The technical tools 
that we will use are standard ones in the algorithmic information theory literature; our contribution 
is simply to apply those tools to obtain a useful equivalence principle in complexity theory that 
seems not to have been known before.

While the motivation for our equivalence theorem came from quantum computing, we wish to
stress that the theorem itself is much more general, and has nothing to do with quantum computing
in particular. Throughout this paper, we will use the names of quantum complexity classes, such
as BQP (Bounded-Error Quantum Polynomial-Time), the class of languages efficiently decidable
by a quantum algorithm, but only as "black boxes." No familiarity whatsoever with quantum
computing is needed.

The rest of the paper is organized as follows. Section 1.1 contains a general discussion of the
relationships among decision problems, promise problems, search problems, and sampling problems;
it can be safely skipped by readers already familiar with this material. Section 1.2 states our main
result, as well as its implications for quantum computing in general and linear-optics experiments in
particular. Section 1.3 explains how Kolmogorov complexity is used to prove the main result, and
situates the result in the context of earlier work on Kolmogorov complexity. Next, in Section 2,
we review some needed definitions and results from complexity theory (in Section 2.1), algorithmic
information theory (in Section 2.2), and "standard" information theory (in Section 2.3). We prove
the main result in Section 3, and the example application to quantum computing in Section 3.1.
Finally, in Section 4, we present several extensions and generalizations of the main result, which
address various shortcomings of it. Section 4 also discusses numerous open problems.

1.1 Background 

Theoretical computer science has traditionally focused on language decision problems, where given
a language $L \subseteq \{0,1\}^*$, the goal is to decide whether $x \in L$ for any input $x$. From this perspective,
asking whether quantum computing contradicts the ECT is tantamount to asking: 

Problem 1 Does BPP = BQP? 

However, one can also consider promise problems, where the goal is to accept all inputs in a
set $L_{\mathrm{YES}} \subseteq \{0,1\}^*$ and reject all inputs in another set $L_{\mathrm{NO}} \subseteq \{0,1\}^*$. Here $L_{\mathrm{YES}}$ and $L_{\mathrm{NO}}$ are
disjoint, but their union is not necessarily all strings, and an algorithm can do whatever it likes
on inputs not in $L_{\mathrm{YES}} \cup L_{\mathrm{NO}}$. Goldreich [5] has made a strong case that promise problems are at
least as fundamental as language decision problems, if not more so. To give one relevant example, 
the task 

Given a quantum circuit C , estimate the probability p (C) that C accepts 

is easy to formulate as a promise problem, but has no known formulation as a language decision 
problem. The reason is the usual "knife-edge" issue: given any probability $p^* \in [0,1]$ and error
bound $\varepsilon \ge 1/\mathrm{poly}(n)$, we can ask a simulation algorithm to accept all quantum circuits $C$ such
that $p(C) \ge p^* + \varepsilon$, and to reject all circuits $C$ such that $p(C) \le p^* - \varepsilon$. But we cannot reasonably
ask an algorithm to decide whether $p(C) = p^* + 2^{-n}$ or $p(C) = p^* - 2^{-n}$: if $p(C)$ is too close to
$p^*$, then the algorithm's behavior is unknown.

Let PromiseBPP and PromiseBQP be the classes of promise problems solvable by probabilistic 
and quantum computers respectively, in polynomial time and with bounded probability of error. 
Then a second way to ask whether quantum mechanics contradicts the ECT is to ask: 

Problem 2 Does PromiseBPP = PromiseBQP?

Now, if one accepts replacing languages by promise problems, then there seems little reason not
to go further. One can also consider search problems, where given an input $x \in \{0,1\}^n$, the goal is
to output any element of some nonempty "solution set" $A_x \subseteq \{0,1\}^{\mathrm{poly}(n)}$. (Search problems are
also called "relational problems," for the historical reason that one can define such a problem using
a binary relation $R \subseteq \{0,1\}^* \times \{0,1\}^*$, with $(x,y) \in R$ if and only if $y \in A_x$. Another name often
used is "function problems." But that is inaccurate, since the desired output is not a function of
the input, except in the special case $|A_x| = 1$. We find "search problems" to be the clearest name,
and will use it throughout in the hope that it sticks. The one important point to remember is
that a search problem need not be an NP search problem: that is, solutions need not be efficiently
verifiable.)

Perhaps the most famous example of a search problem is finding a Nash equilibrium, which 
Daskalakis et al. [3] showed to be complete for the class PPAD. By Nash's Theorem, every game 
has at least one Nash equilibrium, but the problem of finding one has no known formulation as 
either a language decision problem or a promise problem. 

Let FBPP and FBQP be the classes of search problems solvable by probabilistic and quantum
computers respectively, with success probability $1 - \delta$, in time polynomial in $n$ and $1/\delta$. Then a
third version of the "ECT question" is:



Problem 3 Does FBPP = FBQP? 



There is yet another important type of problem in theoretical computer science. These are
sampling problems, where given an input $x \in \{0,1\}^n$, the goal is to sample (exactly or, more
often, approximately) from some probability distribution $\mathcal{D}_x$ over poly(n)-bit strings. Well-known
examples of sampling problems include sampling a random point in a high-dimensional convex body
and sampling a random matching in a bipartite graph.

Let SampP and SampBQP be the classes of sampling problems that are solvable by probabilistic
and quantum computers respectively, to within $\varepsilon$ error in total variation distance, in time
polynomial in $n$ and $1/\varepsilon$. Then a fourth version of our question is:

Problem 4 Does SampP = SampBQP? 

Not surprisingly, all of the above questions are open. But we can ask an obvious meta-question: 



What is the relationship among Problems 1-4? If the ECT fails in one sense, must it
fail in the other senses as well?

In one direction, there are some easy implications:

$$\mathsf{SampP} = \mathsf{SampBQP} \Longrightarrow \mathsf{FBPP} = \mathsf{FBQP} \Longrightarrow \mathsf{PromiseBPP} = \mathsf{PromiseBQP} \Longrightarrow \mathsf{BPP} = \mathsf{BQP}.$$



1 The F in FBPP and FBQP stands for "function problem." Here we are following the standard naming convention,
even though the term "function problem" is misleading for the reason pointed out earlier.

2 Note that we write SampP instead of "SampBPP" because there is no chance of confusion here. Unlike with
decision, promise, and relation problems, with sampling problems it does not even make sense to talk about
deterministic algorithms.

For the first implication, if every quantumly samplable distribution were also classically samplable,
then given a quantum algorithm $Q$ solving a search problem $R$, we could approximate $Q$'s output
distribution using a classical computer, and thereby solve $R$ classically as well. For the second and
third implications, every promise problem is also a search problem (with solution set $A_x \subseteq \{0,1\}$),
and every language decision problem is also a promise problem (with the empty promise).

So the interesting part concerns the possible implications in the "other" direction. For example, 
could it be the case that BPP = BQP, yet PromiseBPP $\ne$ PromiseBQP? Not only is this a formal
possibility, but it does not even seem absurd, when we consider that 

(1) the existing candidates for languages in BQP \ BPP (for example, decision versions of the 
factoring and discrete log problems [8]) are all extremely "special" in nature, but 

(2) PromiseBQP contains the "general" problem of estimating the acceptance probability of an 
arbitrary quantum circuit. 

A second example of a difficult and unsolved meta-question is whether PromiseBPP = PromiseBQP 
implies SampP = SampBQP. Translated into "physics language," the question is this: suppose we 
had an efficient classical algorithm to estimate the expectation value of any observable in quantum 
mechanics. Would that imply an efficient classical algorithm to simulate any quantum experiment, 
in the sense of sampling from a probability distribution close to the one quantum mechanics
predicts? The difficulty is that, if we consider a quantum system of $n$ particles, then a measurement
could in general have $c^n$ possible outcomes, each with probability on the order of $c^{-n}$. So, even
supposing we could estimate any given probability to within $\pm\varepsilon$, in time polynomial in $n$ and $1/\varepsilon$,
that would seem to be of little help for the sampling task.

1.2 Our Results 

This paper shows that two of the four types of problem discussed above (namely, sampling problems
and search problems) are essentially equivalent. More precisely, given any sampling problem $S$, we
will construct a search problem $R = R_S$ such that, if C is any "reasonable" model of computation,
then $S$ is in SampC (the sampling version of C) if and only if $R$ is in FC (the search version of C).
Here is a more formal statement of the result:

Theorem 5 (Sampling/Searching Equivalence Theorem) Let $S$ be any sampling problem.
Then there exists a search problem $R_S$ such that

(i) If $O$ is any oracle for $S$, then $R_S \in \mathsf{FBPP}^O$.

(ii) If $B$ is any probabilistic Turing machine solving $R_S$, then $S \in \mathsf{SampP}^B$.

As one application, we show that the "obvious" implication
$\mathsf{SampP} = \mathsf{SampBQP} \Longrightarrow \mathsf{FBPP} = \mathsf{FBQP}$ can be reversed:

Theorem 6 FBPP = FBQP if and only if SampP = SampBQP. In other words, classical computers
can efficiently solve every FBQP search problem, if and only if they can approximately sample
the output distribution of every quantum circuit.

As a second application (which was actually the original motivation for this work), we are able
to extend a recent result of Aaronson and Arkhipov [1]. These authors give a sampling problem
that is solvable using a simple linear-optics experiment (so in particular, in SampBQP), but is not
solvable efficiently by a classical computer, unless the permanent of a Gaussian random matrix can
be approximated in $\mathsf{BPP}^{\mathsf{NP}}$. More formally, consider the following problem, called $|\mathrm{GPE}|^2$ (the
GPE stands for Gaussian Permanent Estimation):

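Problem 7 ($|\mathrm{GPE}|^2$) Given an input of the form $\langle X, 0^{1/\varepsilon}, 0^{1/\delta} \rangle$, where $X \in \mathbb{C}^{n \times n}$ is a
matrix of independent $\mathcal{N}(0,1)_{\mathbb{C}}$ Gaussians, output a real number $y$ such that

$$\left| y - |\mathrm{Per}(X)|^2 \right| \le \varepsilon \cdot n!,$$

with probability at least $1 - \delta$ over both $X \sim \mathcal{N}(0,1)_{\mathbb{C}}^{n \times n}$ and any internal randomness used by the
algorithm.

Here $0^{1/\varepsilon}$ and $0^{1/\delta}$ represent the numbers $1/\varepsilon$ and $1/\delta$ respectively encoded in unary; such unary
encoding is a standard trick for forcing an algorithm's running time to be polynomial in $1/\varepsilon$ and
$1/\delta$ as well as $n$.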

The main result of [1] is the following:

Theorem 8 (Aaronson and Arkhipov [1]) SampP = SampBQP implies $|\mathrm{GPE}|^2 \in \mathsf{FBPP}^{\mathsf{NP}}$.

Note that Theorem 8 relativizes: for all oracles $O$, if $\mathsf{SampBQP}^O \subseteq \mathsf{SampBPP}^O$, then
$|\mathrm{GPE}|^2 \in \mathsf{FBPP}^{\mathsf{NP}^O}$.

The central conjecture made in [1] is that estimating $|\mathrm{Per}(X)|^2$ is as hard for a Gaussian random
matrix $X$ as it is for an arbitrary matrix $X \in \mathbb{C}^{n \times n}$:

Conjecture 9 ([1]) $|\mathrm{GPE}|^2$ is #P-complete.

Much of [1] is devoted to giving evidence for Conjecture 9.

Notice that, if Conjecture 9 holds, then combining it with Theorem 8, we find that SampP =
SampBQP implies $\mathsf{P}^{\#\mathsf{P}} = \mathsf{BPP}^{\mathsf{NP}}$ (which in turn implies $\mathsf{PH} = \mathsf{BPP}^{\mathsf{NP}}$ by Toda's Theorem [9]). Or
to put it differently: assuming Conjecture 9, there can be no polynomial-time classical algorithm to
sample (even approximately) the output distribution of quantum circuits in general, or the
linear-optics experiment of [1] in particular, unless the polynomial hierarchy collapses to $\mathsf{BPP}^{\mathsf{NP}}$. This
can be taken as a surprising new form of evidence against the Extended Church-Turing Thesis
(assuming, of course, that one is willing to state the ECT in terms of sampling problems).

Now, by using Theorem 6 from this paper, we can deduce, in a completely "automatic" way,
that the counterpart of Theorem 8 holds with search problems in place of sampling problems:



Corollary 10 FBPP = FBQP implies $|\mathrm{GPE}|^2 \in \mathsf{FBPP}^{\mathsf{NP}}$. So in particular, assuming $|\mathrm{GPE}|^2$ is
#P-complete and PH is infinite, it follows that FBPP $\ne$ FBQP.

Indeed, assuming $|\mathrm{GPE}|^2$ is #P-complete, we cannot even have $\mathsf{FBQP} \subseteq \mathsf{FBPP}^{\mathsf{PH}}$, unless $\mathsf{P}^{\#\mathsf{P}} = \mathsf{PH}$
and the polynomial hierarchy collapses. To strengthen Corollary 10 still further, notice that one
can replace FBQP by "the class of search problems efficiently solvable with the help of a linear-optics
computer," which is almost certainly a proper subclass of FBQP.

1.3 Proof Overview



Let us explain the basic difficulty we need to overcome to prove Theorem 5. Given a probability
distribution $\mathcal{D}_x$ over $\{0,1\}^{\mathrm{poly}(n)}$, we want to define a set $A_x \subseteq \{0,1\}^{\mathrm{poly}(n)}$, such that the ability
to find an element of $A_x$ is equivalent to the ability to sample from $\mathcal{D}_x$. At first glance, such a
general reduction seems impossible. For let $R = \{A_x\}_x$ be the search problem in which the goal
is to find an element of $A_x$ given $x$. Then consider an oracle $O$ that, on input $x$, returns the
lexicographically first element of $A_x$. Such an oracle $O$ certainly solves $R$, but it seems useless if
our goal is to sample uniformly from the set $A_x$ (or indeed, from any other interesting distribution
related to $A_x$).

Our solution will require going outside the black-box reduction paradigm. In other words,
given a sampling problem $S = \{\mathcal{D}_x\}_x$, we do not show that $S \in \mathsf{SampP}^O$, where $O$ is any oracle
that solves the corresponding search problem $R_S$. Instead, we use the fact that $O$ is computed by
a Turing machine. We then define $R_S$ in such a way that $O$ must return, not just any element in
the support of $\mathcal{D}_x$, but an element with near-maximal Kolmogorov complexity.

The idea here is simple: if a Turing machine B is probabilistic, then it can certainly output a 
string x with high Kolmogorov complexity, by just generating x at random. But the converse also 
holds: if B outputs a string x with high Kolmogorov complexity, then x must have been generated 
randomly. For otherwise, the code of B would constitute a succinct description of x. 

Given any set $A \subseteq \{0,1\}^n$, it is not hard to use the above "Kolmogorov trick" to force a
probabilistic Turing machine $B$ to sample almost-uniformly from $A$. We simply ask $B$ to produce
$k$ samples $x_1, \ldots, x_k \in A$, for some $k = \mathrm{poly}(n)$, such that the tuple $(x_1, \ldots, x_k)$ has Kolmogorov
complexity close to $k \log_2 |A|$. Then we output $x_i$ for a uniformly random $i \in [k]$.
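
As a rough illustration of the "Kolmogorov trick" only: Kolmogorov complexity is uncomputable, so the sketch below substitutes the zlib-compressed length, which is merely a crude, computable upper-bound proxy and not the quantity used in this paper. The function names and the parameters `k` and `n_bytes` are hypothetical. The point is just that a tuple produced by a fixed rule compresses well, while a tuple of uniformly random samples does not.

```python
import random
import zlib


def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data (a crude stand-in for K)."""
    return 8 * len(zlib.compress(data, 9))


def deterministic_tuple(k: int, n_bytes: int) -> bytes:
    """k 'samples' produced by a fixed rule: sample i is byte (i mod 256) repeated."""
    return b"".join(bytes([i % 256]) * n_bytes for i in range(k))


def random_tuple(k: int, n_bytes: int, rng: random.Random) -> bytes:
    """k samples drawn uniformly at random from all n_bytes-byte strings."""
    return bytes(rng.getrandbits(8) for _ in range(k * n_bytes))


rng = random.Random(0)
k, n_bytes = 64, 32
raw_bits = 8 * k * n_bytes
fixed_bits = compressed_bits(deterministic_tuple(k, n_bytes))
random_bits = compressed_bits(random_tuple(k, n_bytes, rng))
print(raw_bits, fixed_bits, random_bits)
```

Under this proxy, only the randomly generated tuple has description length close to the maximum of `raw_bits`, mirroring the requirement that the tuple have Kolmogorov complexity close to $k \log_2 |A|$.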

However, one can also generalize the idea, to force $B$ to sample from an arbitrary distribution
$\mathcal{D}$, not necessarily uniform. One way of doing this would be to reduce to the uniform case,
by dividing the support of $\mathcal{D}$ into poly(n) "buckets," such that $\mathcal{D}$ is nearly-uniform within each
bucket, and then asking $B$ to output Kolmogorov-random elements in each bucket. In this paper,
however, we will follow a more direct approach, which exploits the beautiful known connection 
between Kolmogorov complexity and Shannon information. In particular, we will use the notion 
of a universal randomness test from algorithmic information theory [fHH]. Let U be the "universal 
prior," in which each string $x \in \{0,1\}^*$ occurs with probability proportional to $2^{-K(x)}$, where $K(x)$
is the prefix-free Kolmogorov complexity of $x$. Then given any computable distribution $\mathcal{D}$ and
fixed string $x$, the universal randomness test provides a way to decide whether $x$ was "plausibly
drawn from $\mathcal{D}$," by considering the ratio $\Pr_{\mathcal{D}}[x] / \Pr_{U}[x]$. The main technical fact we need to prove
is simply that such a test can be applied in our complexity-theoretic context, where we care (for
example) that the number of samples from $\mathcal{D}$ scales polynomially with the inverses of the relevant
error parameters.

From one perspective, our result represents a surprising use of Kolmogorov complexity in the 
seemingly "distant" realm of polynomial-time reductions. Let us stress that we are not using 
Kolmogorov complexity as just a technical convenience, or as shorthand for a counting argument. 
Rather, Kolmogorov complexity seems essential even to define a search problem $R_S$ with the
properties we need. From another perspective, however, our use of Kolmogorov complexity is close in
spirit to the reasons why Kolmogorov complexity was defined and studied in the first place! The 
whole point, after all, is to be able to talk about the "randomness of an individual object," without 

3 This was previously done for different reasons in a cryptographic context; see for example Barak's beautiful PhD
thesis [2].

reference to any distribution from which the object was drawn. And that is exactly what we need,
if we want to achieve the "paradoxical" goal of sampling from a distribution, using an oracle that
is guaranteed only to output a fixed string $x$ with specified properties.

2 Preliminaries 

2.1 Sampling and Search Problems 

We first formally define sampling problems, as well as the complexity classes SampP and SampBQP 
of sampling problems that are efficiently solvable by classical and quantum computers respectively. 

Definition 11 (Sampling Problems, SampP, and SampBQP) A sampling problem $S$ is a
collection of probability distributions $(\mathcal{D}_x)_{x \in \{0,1\}^*}$, one for each input string $x \in \{0,1\}^n$, where $\mathcal{D}_x$
is a distribution over $\{0,1\}^{p(n)}$, for some fixed polynomial $p$. Then SampP is the class of
sampling problems $S = (\mathcal{D}_x)_{x \in \{0,1\}^*}$ for which there exists a probabilistic polynomial-time algorithm $B$
that, given $\langle x, 0^{1/\varepsilon} \rangle$ as input, samples from a probability distribution $\mathcal{C}_x$ such that $\|\mathcal{C}_x - \mathcal{D}_x\| \le \varepsilon$.
SampBQP is defined the same way, except that $B$ is a quantum algorithm rather than a classical
one.

Let us also define search problems, as well as the complexity classes FBPP and FBQP of search 
problems that are efficiently solvable by classical and quantum computers respectively. 

Definition 12 (Search Problems, FBPP, and FBQP) A search problem $R$ is a collection of
nonempty sets $(A_x)_{x \in \{0,1\}^*}$, one for each input string $x \in \{0,1\}^n$, where $A_x \subseteq \{0,1\}^{p(n)}$ for some
fixed polynomial $p$. Then FBPP is the class of search problems $R = (A_x)_{x \in \{0,1\}^*}$ for which there
exists a probabilistic polynomial-time algorithm $B$ that, given an input $x \in \{0,1\}^n$ together with
$0^{1/\varepsilon}$, produces an output $y$ such that

$$\Pr[y \in A_x] \ge 1 - \varepsilon,$$

where the probability is over $B$'s internal randomness. FBQP is defined the same way, except that
$B$ is a quantum algorithm rather than a classical one.

2.2 Algorithmic Information Theory 

We now review some basic definitions and results from the theory of Kolmogorov complexity. Recall 
that a set of strings $P \subseteq \{0,1\}^*$ is called prefix-free if no $x \in P$ is a prefix of any other $y \in P$.

Definition 13 (Kolmogorov complexity) Fix a universal Turing machine $U$, such that the set
of valid programs of $U$ is prefix-free. Then $K(y)$, or the prefix-free Kolmogorov complexity of $y$,
is the minimum length of a program $x$ such that $U(x) = y$. We can also define the conditional
Kolmogorov complexity $K(y|z)$, as the minimum length of a program $x$ such that $U(\langle x, z \rangle) = y$.

We are going to need two basic lemmas that relate Kolmogorov complexity to standard
information theory, and that can be found in the book of Li and Vitanyi [6] for example. The first
lemma follows almost immediately from Shannon's noiseless channel coding theorem.

Lemma 14 Let $\mathcal{D} = \{p_x\}$ be any computable distribution over strings, and let $x$ be any element in
the support of $\mathcal{D}$. Then

$$K(x) \le \log_2 \frac{1}{p_x} + K(\mathcal{D}) + O(1),$$

where $K(\mathcal{D})$ represents the length of the shortest program to sample from $\mathcal{D}$. The same holds if
we replace $K(x)$ and $K(\mathcal{D})$ by $K(x|y)$ and $K(\mathcal{D}|y)$ respectively, for any fixed $y$.

The next lemma follows from a counting argument. 

Lemma 15 ([6]) Let $\mathcal{D} = \{p_x\}$ be any distribution over strings (not necessarily computable).
Then there exists a universal constant $b$ such that

$$\Pr_{x \sim \mathcal{D}}\left[ K(x) \ge \log_2 \frac{1}{p_x} - c \right] \ge 1 - \frac{b}{2^c}.$$

The same holds if we replace $K(x)$ by $K(x|y)$ for any fixed $y$.
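
The counting argument behind a bound of this shape can be checked numerically for any computable prefix-free code, which is a stand-in we substitute here because $K$ itself is uncomputable. In the sketch below (function names and the example distributions `p` and `q` are hypothetical), code lengths $\ell(x)$ satisfying the Kraft inequality play the role of $K(x)$. For such lengths, the probability under $p$ that $\ell(x)$ falls below $\log_2(1/p_x) - c$ is at most $2^{-c}$, since each such $x$ has $p_x$ below $2^{-c} \cdot 2^{-\ell(x)}$.

```python
import math


def shannon_lengths(q):
    """Code lengths ceil(log2 1/q_x); these satisfy the Kraft inequality."""
    return [math.ceil(math.log2(1.0 / qx)) for qx in q]


def kraft_sum(lengths):
    """Sum of 2^(-length); at most 1 for any prefix-free code."""
    return sum(2.0 ** -length for length in lengths)


def prob_short_code(p, lengths, c):
    """Pr_{x ~ p}[ length(x) < log2(1/p_x) - c ]."""
    return sum(
        px
        for px, length in zip(p, lengths)
        if px > 0 and length < math.log2(1.0 / px) - c
    )


# The code is built for q but the strings are drawn from a different p,
# so some strings receive codewords shorter than log2(1/p_x).
p = [0.5, 0.25, 0.125, 0.125]
q = [0.01, 0.01, 0.49, 0.49]
lengths = shannon_lengths(q)
for c in range(4):
    print(c, prob_short_code(p, lengths, c), 2.0 ** -c)
```

The printed probabilities never exceed $2^{-c}$, which is the analogue of the lemma with $b = 1$ for this particular code.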
2.3 Information Theory 

This section reviews some basic definitions and facts from information theory. Let $A = \{p_x\}_x$
and $B = \{q_x\}_x$ be two probability distributions over $[N]$. Then recall that the variation distance
between $A$ and $B$ is defined as

$$\|A - B\| := \frac{1}{2} \sum_{x=1}^{N} |p_x - q_x|,$$

while the KL-divergence is

$$D_{KL}(A \| B) := \sum_{x=1}^{N} p_x \log_2 \frac{p_x}{q_x}.$$

The variation distance and the KL-divergence are related as follows:

Proposition 16 (Pinsker's Inequality) $\|A - B\| \le \sqrt{2 D_{KL}(A \| B)}$.

We will also need a fact about KL-divergence that has been useful in the study of parallel 
repetition, and that can be found (for example) in a paper by Rao [7]. 

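Proposition 17 ([7]) Let $\mathcal{R}$ be a distribution over $[N]^k$, with marginal distribution $\mathcal{R}_i$ on the $i$-th
coordinate. Let $\mathcal{D}$ be a distribution over $[N]$. Then

$$\sum_{i=1}^{k} D_{KL}(\mathcal{R}_i \| \mathcal{D}) \le D_{KL}(\mathcal{R} \| \mathcal{D}^k).$$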
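
The definitions and the two propositions of this section can be checked numerically on small examples. The following is a minimal sketch, not part of the paper's argument: the helper names, the alphabet size `n`, the tuple length `k`, and the random test distributions are all assumptions chosen for illustration.

```python
import itertools
import math
import random


def tv_distance(a, b):
    """Variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def kl_divergence(a, b):
    """KL-divergence in bits; terms with a_x = 0 contribute zero."""
    return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)


def random_distribution(size, rng):
    """A strictly positive random distribution over `size` outcomes."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def marginal(joint, outcomes, i, n):
    """Marginal of a joint distribution over [n]^k on coordinate i."""
    result = [0.0] * n
    for prob, outcome in zip(joint, outcomes):
        result[outcome[i]] += prob
    return result


rng = random.Random(1)
n, k = 3, 2
outcomes = list(itertools.product(range(n), repeat=k))
d = random_distribution(n, rng)
r = random_distribution(n ** k, rng)  # arbitrary, possibly correlated joint
d_k = [math.prod(d[j] for j in outcome) for outcome in outcomes]

# Proposition 16 (in the form stated above, with logarithms base 2).
pinsker_ok = tv_distance(r, d_k) <= math.sqrt(2 * kl_divergence(r, d_k))

# Proposition 17: the marginal divergences sum to at most the joint divergence.
sum_marginals = sum(kl_divergence(marginal(r, outcomes, i, n), d) for i in range(k))
superadditive_ok = sum_marginals <= kl_divergence(r, d_k) + 1e-12
print(pinsker_ok, superadditive_ok)
```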
3 Main Result



Let $S = \{\mathcal{D}_x\}_x$ be a sampling problem. Then our goal is to construct a search problem $R =
R_S = \{A_x\}_x$ that is "equivalent" to $S$. Given an input of the form $\langle x, 0^{1/\delta} \rangle$, the goal in the search
problem will be to produce an output $Y$ such that $Y \in A_{x,\delta}$, with success probability at least $1 - \delta$.
The running time should be $\mathrm{poly}(n, 1/\delta)$.

Fix an input $x \in \{0,1\}^n$, and let $\mathcal{D} := \mathcal{D}_x$ be the corresponding probability distribution over
$\{0,1\}^m$. Let $p_y := \Pr_{\mathcal{D}}[y]$ be the probability of $y$. We now define the search problem $R$. Let
$N := m/\delta^{2.1}$, and let $Y = (y_1, \ldots, y_N)$ be an $N$-tuple of $m$-bit strings. Then we set $Y \in A_{x,\delta}$ if
and only if

$$\log_2 \frac{1}{p_{y_1} \cdots p_{y_N}} \le K(Y \mid x, \delta) + \beta,$$

where $\beta := 1 + \log_2 1/\delta$.
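
To make the parameters concrete, here is a small sketch that computes $N$, $\beta$, and the left-hand side of the membership condition. The function names are hypothetical, and rounding $N$ up to an integer is our assumption (the definition above does not specify rounding). The right-hand side involves $K(Y \mid x, \delta)$, which is uncomputable, so membership in $A_{x,\delta}$ itself cannot be tested this way.

```python
import math


def search_parameters(m: int, delta: float):
    """Return (N, beta) with N = ceil(m / delta^2.1) and beta = 1 + log2(1/delta)."""
    n_samples = math.ceil(m / delta ** 2.1)  # rounding up is an assumption
    beta = 1.0 + math.log2(1.0 / delta)
    return n_samples, beta


def tuple_surprisal(probabilities):
    """log2 of 1 / (p_{y_1} ... p_{y_N}), computed as a sum of logarithms."""
    return sum(-math.log2(p) for p in probabilities)


n_samples, beta = search_parameters(m=8, delta=0.5)
print(n_samples, beta)
print(tuple_surprisal([0.5, 0.25]))
```

Summing logarithms instead of multiplying the probabilities avoids floating-point underflow when $N$ is large.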

The first thing we need to show is that any algorithm that solves the sampling problem S also 
solves the search problem R with high probability. 

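Lemma 18 Let $\mathcal{C} = \mathcal{C}_x$ be any distribution over $\{0,1\}^m$ such that $\|\mathcal{C} - \mathcal{D}\| \le \varepsilon$. Then

$$\Pr_{Y \sim \mathcal{C}^N}[Y \notin A_{x,\delta}] \le \varepsilon N + \frac{b}{2^\beta}.$$

Proof. We have

$$\Pr_{Y \sim \mathcal{C}^N}[Y \notin A_{x,\delta}] \le \Pr_{Y \sim \mathcal{D}^N}[Y \notin A_{x,\delta}] + \|\mathcal{C}^N - \mathcal{D}^N\| \le \Pr_{Y \sim \mathcal{D}^N}[Y \notin A_{x,\delta}] + \varepsilon N.$$

So it suffices to consider a $Y$ drawn from $\mathcal{D}^N$. By Lemma 15,

$$\Pr_{Y \sim \mathcal{D}^N}\left[ K(Y \mid x, \delta) \ge \log_2 \frac{1}{p_{y_1} \cdots p_{y_N}} - \beta \right] \ge 1 - \frac{b}{2^\beta}.$$

Therefore

$$\Pr_{Y \sim \mathcal{D}^N}[Y \notin A_{x,\delta}] \le \frac{b}{2^\beta},$$

and we are done. ■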
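
The proof uses the standard fact that the variation distance between $N$-fold product distributions is at most $N$ times the single-copy distance. A minimal numerical check of that fact on a three-outcome example is sketched below; the distributions `c` and `d` and the helper names are hypothetical.

```python
import itertools
import math


def tv_distance(a, b):
    """Variation distance: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def product_power(dist, n):
    """Distribution of n independent draws from dist, as a flat list."""
    return [math.prod(t) for t in itertools.product(dist, repeat=n)]


c = [0.5, 0.3, 0.2]
d = [0.45, 0.35, 0.2]
eps = tv_distance(c, d)
gaps = [tv_distance(product_power(c, n), product_power(d, n)) for n in range(1, 6)]
for n, gap in enumerate(gaps, start=1):
    print(n, gap, n * eps)
```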

The second thing we need to show is that any algorithm that solves the search problem $R$ also
samples from a distribution that is close to $\mathcal{D}$ in variation distance.

Lemma 19 Let $B$ be a probabilistic Turing machine, which given input $\langle x, 0^{1/\delta} \rangle$ outputs an
$N$-tuple $Y = (y_1, \ldots, y_N)$ of $m$-bit strings. Suppose that

$$\Pr\left[ B(x, 0^{1/\delta}) \in A_{x,\delta} \right] \ge 1 - \delta,$$

where the probability is over $B$'s internal randomness. Let $\mathcal{R} = \mathcal{R}_x$ be the distribution over outputs
of $B(x, 0^{1/\delta})$, and let $\mathcal{C} = \mathcal{C}_x$ be the distribution over $\{0,1\}^m$ that is obtained from $\mathcal{R}$ by choosing
one of the $y_i$'s uniformly at random. Then there exists a constant $Q_B$, depending on $B$, such that

$$\|\mathcal{C} - \mathcal{D}\| \le \delta + Q_B \sqrt{\frac{\beta}{N}}.$$

Proof. Let $\mathcal{R}'$ be a distribution that is identical to $\mathcal{R}$, except that we condition on $B(x, 0^{1/\delta}) \in
A_{x,\delta}$. Then by hypothesis, $\|\mathcal{R} - \mathcal{R}'\| \le \delta$. Now let $\mathcal{R}'_i$ be the marginal distribution of $\mathcal{R}'$ on the
$i$-th coordinate, and let

$$\mathcal{C}' = \frac{1}{N} \sum_{i=1}^{N} \mathcal{R}'_i$$

be the distribution over $\{0,1\}^m$ that is obtained from $\mathcal{R}'$ by choosing one of the $y_i$'s uniformly at
random. Then clearly $\|\mathcal{C} - \mathcal{C}'\| \le \delta$ as well. So by the triangle inequality,

$$\|\mathcal{C} - \mathcal{D}\| \le \|\mathcal{C} - \mathcal{C}'\| + \|\mathcal{C}' - \mathcal{D}\| \le \delta + \|\mathcal{C}' - \mathcal{D}\|,$$

and it suffices to upper-bound $\|\mathcal{C}' - \mathcal{D}\|$.
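Let $q_Y := \Pr_{\mathcal{R}'}[Y]$. Then by Lemma 14,

$$K(Y \mid x, \delta) \le \log_2 \frac{1}{q_Y} + K(\mathcal{R}') + O(1)$$

for all $Y \in (\{0,1\}^m)^N$. Also, since $Y \in A_{x,\delta}$, by assumption we have

$$\log_2 \frac{1}{p_{y_1} \cdots p_{y_N}} \le K(Y \mid x, \delta) + \beta.$$

Combining,

$$\log_2 \frac{1}{p_{y_1} \cdots p_{y_N}} \le \log_2 \frac{1}{q_Y} + K(\mathcal{R}') + O(1) + \beta.$$

This implies the following upper bound on the KL-divergence:

$$D_{KL}(\mathcal{R}' \| \mathcal{D}^N) = \sum_{Y \in (\{0,1\}^m)^N} q_Y \log_2 \frac{q_Y}{p_{y_1} \cdots p_{y_N}} \le \max_Y \log_2 \frac{q_Y}{p_{y_1} \cdots p_{y_N}} \le K(\mathcal{R}') + O(1) + \beta.$$

So by Proposition 17,

$$\sum_{i=1}^{N} D_{KL}(\mathcal{R}'_i \| \mathcal{D}) \le D_{KL}(\mathcal{R}' \| \mathcal{D}^N) \le K(\mathcal{R}') + O(1) + \beta,$$

and by Proposition 16,

$$\frac{1}{2} \sum_{i=1}^{N} \|\mathcal{R}'_i - \mathcal{D}\|^2 \le K(\mathcal{R}') + O(1) + \beta.$$

So by Cauchy-Schwarz,

$$\sum_{i=1}^{N} \|\mathcal{R}'_i - \mathcal{D}\| \le \sqrt{N \left( 2\beta + 2K(\mathcal{R}') + O(1) \right)}.$$

Hence

$$\|\mathcal{C}' - \mathcal{D}\| \le \sqrt{\frac{2\beta + 2K(\mathcal{R}') + O(1)}{N}},$$

and

$$\|\mathcal{C} - \mathcal{D}\| \le \|\mathcal{C} - \mathcal{C}'\| + \|\mathcal{C}' - \mathcal{D}\| \le \delta + \sqrt{\frac{2\beta + 2K(\mathcal{R}') + O(1)}{N}} \le \delta + Q_B \sqrt{\frac{\beta}{N}},$$

for some constant $Q_B$ depending on $B$. ■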

By combining Lemmas 18 and 19, we can now prove Theorem 5: that for any sampling problem
$S = (\mathcal{D}_x)_{x \in \{0,1\}^*}$ (where $\mathcal{D}_x$ is a distribution over $m = m(n)$-bit strings), there exists a search
problem $R_S = (A_x)_{x \in \{0,1\}^*}$ that is "equivalent" to $S$ in the following two senses.

(i) Let $O$ be any oracle that, given $\langle x, 0^{1/\varepsilon}, r \rangle$ as input, outputs a sample from a distribution $\mathcal{C}_x$
such that $\|\mathcal{C}_x - \mathcal{D}_x\| \le \varepsilon$, as we vary the random string $r$. Then $R_S \in \mathsf{FBPP}^O$.

(ii) Let $B$ be any probabilistic Turing machine that, given $\langle x, 0^{1/\delta} \rangle$ as input, outputs a $Y \in
(\{0,1\}^m)^N$ such that $Y \in A_{x,\delta}$ with probability at least $1 - \delta$. Then $S \in \mathsf{SampP}^B$.

Proof of Theorem 5 (Sampling/Searching Equivalence Theorem). For part (i), given an
input $\langle x, 0^{1/\delta} \rangle$, suppose we want to output an $N$-tuple $Y = (y_1, \ldots, y_N) \in (\{0,1\}^m)^N$ such that
$Y \in A_{x,\delta}$, with success probability at least $1 - \delta$. Recall that $N = m/\delta^{2.1}$. Then the algorithm is
this:

(1) Set $\varepsilon := \frac{\delta}{2N} = \frac{\delta^{3.1}}{2m}$.

(2) Call $O$ on inputs $\langle x, 0^{1/\varepsilon}, r_1 \rangle, \ldots, \langle x, 0^{1/\varepsilon}, r_N \rangle$, where $r_1, \ldots, r_N$ are independent random
strings, and output the result as $Y = (y_1, \ldots, y_N)$.

Clearly this algorithm runs in $\mathrm{poly}(n, 1/\delta)$ time. Furthermore, by Lemma 18, its failure
probability is at most

$$\varepsilon N + \frac{b}{2^\beta} \le \delta.$$
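
The part (i) algorithm can be sketched in a few lines. This is only an illustration under stated assumptions: the oracle is modeled as a Python callable `sampler(x, eps, r)`, the setting $\varepsilon = \delta/(2N)$ is our reading of step (1), rounding $N$ up is our choice, and `toy_uniform_sampler` is a hypothetical oracle for the uniform distribution rather than anything from the paper.

```python
import math
import random


def solve_search_with_sampler(x, m, delta, sampler, rng):
    """Part (i): build the N-tuple Y from N independent oracle calls."""
    n_samples = math.ceil(m / delta ** 2.1)  # N = m / delta^2.1, rounded up
    eps = delta / (2 * n_samples)            # assumed reading of step (1)
    seeds = [rng.getrandbits(64) for _ in range(n_samples)]  # r_1, ..., r_N
    return [sampler(x, eps, r) for r in seeds]


def toy_uniform_sampler(x, eps, r):
    """Hypothetical oracle: a uniform len(x)-bit string determined by seed r."""
    return random.Random(r).getrandbits(len(x))


Y = solve_search_with_sampler(
    "0110", m=4, delta=0.5, sampler=toy_uniform_sampler, rng=random.Random(7)
)
print(len(Y), Y[:5])
```

Passing the random string `r` explicitly mirrors the oracle interface in statement (i), where the output distribution arises only as `r` varies.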

For part (ii), given an input $\langle x, 0^{1/\varepsilon} \rangle$, suppose we want to sample from a distribution $\mathcal{C}_x$ such
that $\|\mathcal{C}_x - \mathcal{D}_x\| \le \varepsilon$. Then the algorithm is this:

(1) Set $\delta := \varepsilon/2$, so that $N = m/\delta^{2.1} = \Theta(m/\varepsilon^{2.1})$.

(2) Call $B$ on input $\langle x, 0^{1/\delta} \rangle$, and let $Y = (y_1, \ldots, y_N)$ be $B$'s output.

(3) Choose $i \in [N]$ uniformly at random, and output $y_i$ as the sample from $\mathcal{C}_x$.

Clearly this algorithm runs in $\mathrm{poly}(n, 1/\varepsilon)$ time. Furthermore, by Lemma 19 we have

$$\|\mathcal{C}_x - \mathcal{D}_x\| \le \delta + Q_B \sqrt{\frac{\beta}{N}} \le \frac{\varepsilon}{2} + Q_B \sqrt{\frac{\varepsilon^{2.1} (2 + \log_2 1/\varepsilon)}{m}},$$

for some constant $Q_B$ depending only on $B$. So in particular, there exists a constant $C_B$ such that
$\|\mathcal{C}_x - \mathcal{D}_x\| \le \varepsilon$ for all $m \ge C_B$. For $m$ below $C_B$, we can simply hardwire a description of $\mathcal{D}_x$ for every
$x$ into the algorithm (note that the algorithm can depend on $B$; we do not need a single algorithm
that works for all $B$'s simultaneously). ■
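
The part (ii) reduction is equally short to sketch. Here the search solver $B$ is modeled as a Python callable `search_solver(x, delta)` returning a list; `toy_search_solver` is a hypothetical stand-in that simply returns fresh uniform strings, so the example shows only the control flow of steps (1) to (3), not the Kolmogorov-complexity guarantee.

```python
import random


def sample_via_search(x, eps, search_solver, rng):
    """Part (ii): run the solver with delta = eps/2 and return a random coordinate."""
    delta = eps / 2                     # step (1)
    y_tuple = search_solver(x, delta)   # step (2): Y = (y_1, ..., y_N)
    return rng.choice(y_tuple)          # step (3): y_i for uniformly random i


_solver_rng = random.Random(3)


def toy_search_solver(x, delta):
    """Hypothetical stand-in for B: N uniform m-bit strings, with m = len(x)."""
    m = len(x)
    n_samples = int(m / delta ** 2.1) + 1
    return [_solver_rng.getrandbits(m) for _ in range(n_samples)]


rng = random.Random(11)
samples = [sample_via_search("01", 0.5, toy_search_solver, rng) for _ in range(200)]
print(sorted(set(samples)))
```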

In particular, Theorem 5 means that $S \in \mathsf{SampP}$ if and only if $R_S \in \mathsf{FBPP}$, and likewise
$S \in \mathsf{SampBQP}$ if and only if $R_S \in \mathsf{FBQP}$, and so on for any model of computation that is "below
recursive" (i.e., simulable by a Turing machine) and has the extremely simple closure properties
used in the proof.




3.1 Implication for Quantum Computing 

We now apply Theorem 5 to prove Theorem 6, that SampP = SampBQP if and only if FBPP =
FBQP.

Proof of Theorem 6. First, suppose SampP = SampBQP. Then consider a search problem
$R = (A_x)_x$ in FBQP. By assumption, there exists a polynomial-time quantum algorithm $Q$ that,
given $\langle x, 0^{1/\delta} \rangle$ as input, outputs a $y$ such that $y \in A_x$ with probability at least $1 - \delta$. Let $\mathcal{D}_{x,\delta}$ be
the probability distribution over $y$'s output by $Q$ on input $\langle x, 0^{1/\delta} \rangle$. Then to solve $R$ in FBPP,
clearly it suffices to sample approximately from $\mathcal{D}_{x,\delta}$ in classical polynomial time. But we can do
this by the assumption that SampP = SampBQP.

Second, suppose FBPP = FBQP. Then consider a sampling problem $S$ in SampBQP. By
Theorem 5, we can define a search counterpart $R_S$ of $S$, such that

$$S \in \mathsf{SampBQP} \Longrightarrow R_S \in \mathsf{FBQP} \Longrightarrow R_S \in \mathsf{FBPP} \Longrightarrow S \in \mathsf{SampP}.$$

Hence SampP = SampBQP. ■

Theorem 6 is easily seen to relativize: for all oracles $A$, we have $\mathsf{SampP}^A = \mathsf{SampBQP}^A$ if and
only if $\mathsf{FBPP}^A = \mathsf{FBQP}^A$. (Of course, when proving a relativized version of Theorem 5, we have
to be careful to define the search problem $R_S$ using Kolmogorov complexity for Turing machines
with $A$-oracles.)



4 Extensions and Open Problems 

4.1 Equivalence of Sampling and Decision Problems? 

Perhaps the most interesting question we leave open is whether any nontrivial equivalence holds 
between sampling (or search) problems on the one hand, and decision or promise problems on the 

4 As mentioned in Section 1, the same argument shows that SampP = SampBQP (or equivalently, FBPP = FBQP)
implies BPP = BQP. However, the converse is far from clear: we have no idea whether BPP = BQP implies
SampP = SampBQP.

other. In Theorem 5, it was certainly essential to consider large numbers of outputs; we would
have no idea how to prove an analogous result with a promise problem $P_S$ or language $L_S$ instead
of the search problem $R_S$.

One way to approach this question is as follows: does there exist a sampling problem $S$ that is
provably not equivalent to any decision problem, in the sense that for every language $L \subseteq \{0,1\}^*$,
either $S \notin \mathsf{SampP}^L$, or else there exists an oracle $O$ solving $S$ such that $L \notin \mathsf{BPP}^O$? What if we
require the oracle $O$ to be computable? As far as we know, these questions are open.

One might object that, given any sampling problem $S$, it is easy to define a language $L_S$
that is "equivalent" to $S$, by using the following simple enumeration trick. Let $M_1, M_2, \ldots$ be
an enumeration of probabilistic Turing machines with polynomial-time alarm clocks. Given a
sampling problem $S = (\mathcal{D}_x)_{x \in \{0,1\}^*}$ and an input $X = \langle x, 0^{1/\varepsilon} \rangle$,
say that $M_t$ succeeds on $X$ if $M_t(X)$ samples from a distribution $\mathcal{C}_X$ such that
$\|\mathcal{C}_X - \mathcal{D}_x\| \le \varepsilon$. Also, if $x$ is an $n$-bit string,
define the length of $X = \langle x, 0^{1/\varepsilon} \rangle$ to be $\ell(X) := n + 1/\varepsilon$.

We now define a language $L_S \subseteq \{0,1\}^*$. For all $n$, let $M_{t(n)}$ be the lexicographically
first $M_t$ that succeeds on all inputs $X$ such that $\ell(X) \le n$. Then for all $y \in \{0,1\}^n$, we
set $y \in L_S$ if and only if the Turing machine encoded by $y$ halts in at most $n^{t(n)}$ steps when
run on a blank tape.

Proposition 20 $S \in \mathsf{SampP}$ if and only if $L_S \in \mathsf{P}$.

Proof. First suppose $S \in \mathsf{SampP}$. Then there exists a polynomial-time Turing machine that
succeeds on every input $X = \langle x, 0^{1/\varepsilon} \rangle$. Let $M_t$ be the lexicographically first
such machine. Then it is not hard to see that $L_S$ consists of a finite prefix, followed by the
$n^t$-time bounded halting problem. Hence $L_S \in \mathsf{P}$.

Next suppose $S \notin \mathsf{SampP}$. Then no machine $M_t$ succeeds on every input $X$, so $t(n)$ grows
without bound as a function of $n$. By standard diagonalization arguments, the $n^{t(n)}$-time bounded
halting problem is not in P for any $t$ that grows without bound, regardless of whether $t$ is
time-constructible. Therefore $L_S \notin \mathsf{P}$. ■

Admittedly, Proposition 20 feels like cheating, but why exactly is it cheating? Notice that we
did give a procedure to decide whether $y \in L_S$ for any input $y$. This fact makes Proposition 20 at
least somewhat more interesting than the "tautological" way to ensure
$S \in \mathsf{SampP} \Longleftrightarrow L_S \in \mathsf{P}$:

"Take $L_S$ to be the empty language if $S \in \mathsf{SampP}$, or an EXP-complete language if
$S \notin \mathsf{SampP}$!"

In our view, the real problem with Proposition 20 is that it uses enumeration of Turing machines
to avoid the need to reduce the sampling problem $S$ to the language $L_S$ or vice versa. Of course,
Theorem 5 did not quite reduce $S$ to the search problem $R_S$ either. However, Theorem 5 came
"close enough" to giving a reduction that we were able to use it to derive interesting consequences
for complexity theory, such as SampP = SampBQP if and only if FBPP = FBQP. If we attempted
to prove similar consequences from Proposition 20, then we would end up with a different language
$L_S$, depending on whether our starting assumption was $S \in \mathsf{SampP}$, $S \in \mathsf{SampBQP}$, or
some other assumption. By contrast, Theorem 5 constructed a single search problem $R_S$ that is
equivalent to $S$ in the classical model, the quantum model, and every other "reasonable"
computational model.

4.2 Was Kolmogorov Complexity Necessary? 

Could we have proved Theorem 5 without using Kolmogorov complexity or anything like it, and
without making a computability assumption on the oracle for $R_S$? One way to formalize this
question is to ask the analogue of our question from Section 4.1, but this time for sampling versus
search problems. In other words, does there exist a sampling problem $S$ such that, for every search
problem $R$, either there exists an oracle $O$ solving $S$ such that $R \notin \mathsf{FBPP}^O$, or there
exists an oracle $O$ solving $R$ such that $S \notin \mathsf{SampP}^O$? Notice that, if $R$ is the search
problem from Theorem 5, then the latter oracle (if it exists) must be uncomputable. Thus, we are
essentially asking whether the computability assumption in Theorem 5 was necessary.



4.3 From Search Problems to Sampling Problems

Theorem 5 showed how to take any sampling problem $S$, and define a search problem $R = R_S$ that
is equivalent to $S$. Can one go the other direction? That is, given a search problem $R$, can one
define a sampling problem $S = S_R$ that is equivalent to $R$? The following theorem is the best we
were able to find in this direction.

Theorem 21 Let $R = \{A_x\}_x$ be any search problem. Then there exists a sampling problem
$S_R = \{\mathcal{D}_x\}_x$ that is "almost equivalent" to $R$, in the following senses.

(i) If $O$ is any oracle solving $S_R$, then $R \in \mathsf{FBPP}^O$.

(ii) If $B$ is any probabilistic Turing machine solving $R$, then there exists a constant $\eta_B > 0$
such that a $\mathsf{SampP}^B$ machine can sample from a probability distribution $\mathcal{C}_x$ with
$\|\mathcal{C}_x - \mathcal{D}_x\| \le 1 - \eta_B$.

Proof. Let $\mathcal{U}_x$ be the universal prior, in which every string $y$ occurs with probability at
least $c \cdot 2^{-K(y|x)}$, for some constant $c > 0$. Then to define the sampling problem $S_R$, we let
$\mathcal{D}_x$ be the distribution obtained by drawing $y \sim \mathcal{U}_x$ and then conditioning on the
event $y \in A_x$. (Note that $\mathcal{D}_x$ is well-defined, since $\mathcal{U}_x$ assigns nonzero
probability to every $y$.)

For (i), notice that $\mathcal{D}_x$ has support only on $A_x$. So if we can sample a distribution
$\mathcal{C}_x$ such that $\|\mathcal{C}_x - \mathcal{D}_x\| \le \varepsilon$, then certainly we can output an
element of $A_x$ with probability at least $1 - \varepsilon$.

For (ii), let $\mathcal{C}_{x,\delta}$ be the distribution over values of $B(x, 0^{1/\delta}, r)$ induced by
varying the random string $r$. Then we claim that $\|\mathcal{C}_{x,\delta} - \mathcal{D}_x\| \le 1 - \Omega(1)$,
so long as $\delta \le \Delta_B$ for some constant $\Delta_B$ depending on $B$. To see this, first let
$\mathcal{C}'$ be the distribution obtained by drawing $y \sim \mathcal{C}_{x,\delta}$ and then conditioning on
the event $y \in A_x$. Then since $\Pr_{y \sim \mathcal{C}_{x,\delta}}[y \in A_x] \ge 1 - \delta$, we have
$\|\mathcal{C}' - \mathcal{C}_{x,\delta}\| \le \delta$.
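The conditioning step can be checked numerically. The following is a minimal Python sketch, not part of the original argument: the four-outcome distribution `c` and the set `a` are hypothetical stand-ins for the output distribution of B and for the solution set, and `total_variation` assumes the distance is half the L1 distance. Under that assumption, conditioning on an event moves a distribution by exactly the probability mass lying outside the event, which is at most the failure probability.

```python
from fractions import Fraction


def total_variation(p, q):
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(y, 0) - q.get(y, 0)) for y in support) / 2


def condition_on(dist, event):
    """Restrict dist to the outcomes in event and renormalize."""
    mass = sum(pr for y, pr in dist.items() if y in event)
    if mass == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return {y: pr / mass for y, pr in dist.items() if y in event}


# Hypothetical output distribution of a solver, and a hypothetical solution set.
c = {
    "00": Fraction(1, 2),
    "01": Fraction(1, 4),
    "10": Fraction(1, 5),
    "11": Fraction(1, 20),
}
a = {"00", "01", "10"}

c_prime = condition_on(c, a)
failure_mass = 1 - sum(c[y] for y in a)  # probability of landing outside a
distance = total_variation(c, c_prime)

print(failure_mass, distance)
```

Exact rational arithmetic is used so that the equality between the two quantities is not blurred by floating-point rounding.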

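Now let $q_y := \Pr_{\mathcal{C}'}[y]$. Then by Lemma 14, there exists a constant $g_B$ depending on $B$
such that

$$q_y \le g_B \cdot 2^{-K(y|x)}$$

for all $y \in A_x$. On the other hand, let $p_y := \Pr_{\mathcal{D}_x}[y]$ and
$u_y := \Pr_{\mathcal{U}_x}[y]$. Then there exists a constant $\alpha \ge 1$ such that $p_y = \alpha u_y$ if
$y \in A_x$ and $p_y = 0$ otherwise. So

$$p_y \ge u_y \ge c \cdot 2^{-K(y|x)}$$

for all $y \in A_x$. Hence $p_y \ge \frac{c}{g_B} q_y$, so

$$\|\mathcal{C}' - \mathcal{D}_x\| = \sum_{y \in A_x \,:\, p_y < q_y} |p_y - q_y| \le 1 - \frac{c}{g_B}.$$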
Therefore

$$\|\mathcal{C}_{x,\delta} - \mathcal{D}_x\| \le \|\mathcal{C}_{x,\delta} - \mathcal{C}'\| + \|\mathcal{C}' - \mathcal{D}_x\| \le 1 - \frac{c}{g_B} + \delta,$$

which is $1 - \Omega_B(1)$ provided $\delta \le \frac{c}{2 g_B}$. ■
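The key inequality in this proof is that pointwise domination bounds the distance: if one distribution assigns every outcome at least a kappa fraction of the probability that a second distribution assigns it, then the two are within 1 - kappa of each other. The sketch below illustrates this on a hypothetical two-outcome example; the names `p`, `q`, and `domination_ratio` are illustrative only, and the distance is again assumed to be half the L1 distance.

```python
from fractions import Fraction


def total_variation(p, q):
    """Half the L1 distance between two finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(y, 0) - q.get(y, 0)) for y in support) / 2


def domination_ratio(p, q):
    """Largest kappa with p[y] >= kappa * q[y] for every y where q[y] > 0."""
    return min(Fraction(p.get(y, 0)) / q[y] for y in q if q[y] > 0)


# Hypothetical distributions: q plays the role of the solver's conditioned
# output, p the role of the target sampling distribution.
q = {"a": Fraction(9, 10), "b": Fraction(1, 10)}
p = {"a": Fraction(1, 10), "b": Fraction(9, 10)}

kappa = domination_ratio(p, q)
distance = total_variation(p, q)

# The bound from the proof, with kappa in place of c / g_B.
assert distance <= 1 - kappa
print(kappa, distance)
```

The bound is far from tight in general; it only guarantees that the distance stays bounded away from 1, which is all that part (ii) of the theorem claims.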

We see it as an interesting problem whether Theorem 21 still holds with the condition
$\|\mathcal{C}_x - \mathcal{D}_x\| \le 1 - \eta_B$ replaced by $\|\mathcal{C}_x - \mathcal{D}_x\| \le \varepsilon$
(in other words, with $S_R \in \mathsf{SampP}^B$).



4.4 Making the Search Problem Checkable 

One obvious disadvantage of Theorem 5 is that the search problem $R = (A_x)_x$ is defined using
Kolmogorov complexity, which is uncomputable. In particular, there is no algorithm to decide
whether $y \in A_x$. However, it is not hard to fix this problem, by replacing the Kolmogorov
complexity with the time-bounded or space-bounded Kolmogorov complexities in our definition of
$R$. The price is that we then also have to assume a complexity bound on the Turing machine $B$
in the statement of Theorem 5. In more detail:

Theorem 22 Let $S$ be any sampling problem, and let $f$ be a time-constructible function. Then
there exists a search problem $R_S = (A_x)_x$ such that

(i) If $O$ is any oracle solving $S$, then $R_S \in \mathsf{FBPP}^O$.

(ii) If $B$ is any $\mathsf{BPTIME}(f(n))$ Turing machine solving $R_S$, then $S \in \mathsf{SampP}^B$.

(iii) There exists a $\mathsf{SPACE}(f(n) + n^{O(1)})$ algorithm to decide whether $y \in A_x$, given $x$ and $y$.

Proof Sketch. The proof is almost the same as the proof of Theorem 5. Let $T := f(n) + n^{O(1)}$,
and given a string $y$, let $K_{\mathsf{SPACE}(T)}(y)$ be the $T$-space bounded Kolmogorov complexity of
$y$. Then the only real difference is that, when defining the search problem $R_S$, we replace the
conditional Kolmogorov complexity $K(Y \mid x, \delta)$ by the space-bounded complexity
$K_{\mathsf{SPACE}(T)}(Y \mid x, \delta)$. This ensures that property (iii) holds.

Certainly property (i) still holds, since it only used the fact that there are few tuples
$Y \in (\{0,1\}^m)^N$ with small Kolmogorov complexity, and that is still true for space-bounded
Kolmogorov complexity.

For property (ii), it suffices to observe that Lemma 14 has the following "effective" version. Let
$\mathcal{D} = \{p_y\}$ be any distribution over strings that is samplable in $\mathsf{BPTIME}(f(n))$, and
let $y$ be any element in the support of $\mathcal{D}$. Then there exists a constant $C_{\mathcal{D}}$,
depending on $\mathcal{D}$, such that

$$K_{\mathsf{SPACE}(T)}(y) \le \log_2 \frac{1}{p_y} + C_{\mathcal{D}}.$$

The proof is simply to notice that, in $\mathsf{SPACE}(f(n) + n^{O(1)})$, we can compute the probability
$p_y$ of every $y$ in the support of $\mathcal{D}$, and can therefore recover any particular string $y$ from
its Shannon-Fano code. This means that the analogue of Lemma 19 goes through, as long as $B$ is a
$\mathsf{BPTIME}(f(n))$ machine. ■
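To make the coding step concrete, here is a minimal Python sketch of Shannon's cumulative-probability construction, one standard variant of what is called Shannon-Fano coding. It is an illustration under stated assumptions rather than the space-bounded argument itself: the four-outcome distribution `dist` is hypothetical, and all function names are illustrative. Each outcome with probability p receives a codeword of length at most log2(1/p) + 1, and the codewords are prefix-free, so any outcome can be recovered from its codeword once the probabilities are known.

```python
from fractions import Fraction


def code_length(p):
    """Smallest integer L with 2**(-L) <= p, i.e. ceil(log2(1/p)), computed exactly."""
    length = 0
    while p * 2**length < 1:
        length += 1
    return length


def leading_bits(x, length):
    """First `length` bits of the binary expansion of x, where 0 <= x < 1."""
    bits = []
    for _ in range(length):
        x *= 2
        bit = int(x >= 1)
        bits.append(str(bit))
        x -= bit
    return "".join(bits)


def shannon_code(dist):
    """Assign each outcome the leading bits of the cumulative probability before it."""
    # Sorting by decreasing probability is what makes the code prefix-free.
    ordered = sorted(dist.items(), key=lambda item: (-item[1], item[0]))
    code, cumulative = {}, Fraction(0)
    for y, p in ordered:
        code[y] = leading_bits(cumulative, code_length(p))
        cumulative += p
    return code


def is_prefix_free(words):
    """True if no codeword is a proper prefix of, or equal to, another codeword."""
    words = list(words)
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


# Hypothetical distribution over four strings.
dist = {
    "a": Fraction(2, 5),
    "b": Fraction(3, 10),
    "c": Fraction(1, 5),
    "d": Fraction(1, 10),
}
code = shannon_code(dist)
print(code)
```

Decoding walks the same sorted list and cumulative sums, which is why being able to compute every probability within the space bound is enough to recover the string from its codeword.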

In Theorem 22, how far can we decrease the computational complexity of $R_S$? It is not hard
to replace the upper bound of $\mathsf{SPACE}(f(n) + n^{O(1)})$ by $\mathsf{CH}(f(n) + n^{O(1)})$ (where CH
denotes the counting hierarchy), but can we go further? It seems unlikely that one could check in NP
(or $\mathsf{NTIME}(f(n) + n^{O(1)})$) whether $y \in A_x$, for a search problem $R_S = \{A_x\}_x$ equivalent
to $S$, but can we give formal evidence against this possibility?